Entities

Content item

  • Object which represents the content to be enhanced by Apache Stanbol

Content parts

  • Are used to represent the original content as well as transformations of the original content

Analysed text

  • Used as content part
  • Describes
    • Structure of the text, such as text sections, sentences, chunks and tokens
    • Annotations for the detected text
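
The structure described above can be pictured as a list of typed spans over the text, each carrying its own annotations. The following is only a minimal sketch of that idea: the names `Span`, `AnalysedText`, `add_span`, `covered_text` and `spans_of` are hypothetical and do not mirror Stanbol's Java API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Span:
    """A typed region of the text, addressed by character offsets."""
    kind: str                      # e.g. "sentence", "chunk", "token"
    start: int                     # inclusive character offset
    end: int                       # exclusive character offset
    annotations: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AnalysedText:
    """Hypothetical stand-in for the analysed text content part."""
    text: str
    spans: List[Span] = field(default_factory=list)

    def add_span(self, kind: str, start: int, end: int, **annotations: Any) -> Span:
        # Reject spans that are empty or fall outside the text.
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span offsets must lie inside the text")
        span = Span(kind, start, end, dict(annotations))
        self.spans.append(span)
        return span

    def covered_text(self, span: Span) -> str:
        return self.text[span.start:span.end]

    def spans_of(self, kind: str) -> List[Span]:
        return [s for s in self.spans if s.kind == kind]


at = AnalysedText("Stanbol enhances content.")
at.add_span("sentence", 0, 25)
at.add_span("token", 0, 7, pos="NOUN")
at.add_span("token", 8, 16, pos="VERB")
print([at.covered_text(t) for t in at.spans_of("token")])
```

Keeping offsets instead of copied substrings lets several engines annotate the same text without disagreeing about what a span refers to.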

Enhancer

  • Extracts features from the content passed to it
  • Uses enhancement engines based on the called enhancement chain
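
As a rough illustration of how content is passed to the Enhancer and how a chain is selected, the sketch below builds a POST request for the Enhancer's REST endpoint without sending it. The base URL (a local launcher on port 8080), the chain name `default` and the helper `build_enhancer_request` are assumptions for this example and should be checked against the actual installation.

```python
from typing import Optional
from urllib.request import Request

# Assumption: a Stanbol launcher running locally on port 8080.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(text: str, chain: Optional[str] = None,
                           accept: str = "text/turtle") -> Request:
    """Build (but do not send) a POST request for the Stanbol Enhancer."""
    url = STANBOL_BASE + "/enhancer"
    if chain:
        # A named chain determines which enhancement engines are run.
        url += "/chain/" + chain
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8", "Accept": accept},
        method="POST",
    )


req = build_enhancer_request("Paris is the capital of France.", chain="default")
print(req.get_method(), req.full_url)
```

With a running server, the request could be sent with `urllib.request.urlopen(req)`; the response body would then contain the enhancement results serialized as RDF in the requested format.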

Enhancement engines

Preprocessing

  • Content type detection
  • Text extraction

Natural Language Processing (NLP)

  • Language detection

    • Adds language annotation (as defined by STANBOL-613) to the metadata of a content-item
  • Sentence detection

    • Adds sentences to the analyzed text content part
  • Tokenizer engines

    • Adds tokens to the analyzed text content part
  • Part of speech tagging

    • Assigns words and punctuation marks to parts of speech (verb, noun, etc.)
  • Chunk / phrase detection

    • Adds detected chunks to analyzed text content part
    • Annotates added chunks with the type of the detected phrase
  • Named entity recognition (NER)

    • Writes detected named entities as annotations to the metadata of the content-item
  • Morphological analysis

    • Performs lemmatization (lexicographic reduction of the inflected forms of a word to a base form)
  • General NLP processing engines
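
To make the hand-off between NLP engines concrete, the sketch below runs a toy sentence detector followed by a toy tokenizer that works inside the detected sentences. Both are regex-based stand-ins chosen for brevity, not the models Stanbol's engines actually use, and the function names and the `(kind, start, end)` tuple layout are assumptions for this example.

```python
import re
from typing import List, Tuple

Span = Tuple[str, int, int]  # (kind, start, end), end is exclusive


def detect_sentences(text: str) -> List[Span]:
    """Toy sentence detector: a sentence runs up to the next '.', '!' or '?'."""
    return [("sentence", m.start(), m.end())
            for m in re.finditer(r"[^\s.!?][^.!?]*[.!?]", text)]


def tokenize(text: str, sentences: List[Span]) -> List[Span]:
    """Toy tokenizer: words and single punctuation marks inside each sentence."""
    tokens = []
    for _, start, end in sentences:
        for m in re.finditer(r"\w+|[^\w\s]", text[start:end]):
            # Shift match offsets back into whole-text coordinates.
            tokens.append(("token", start + m.start(), start + m.end()))
    return tokens


text = "Stanbol is modular. Engines add spans!"
sentences = detect_sentences(text)
tokens = tokenize(text, sentences)
print([text[s:e] for _, s, e in tokens])
```

The tokenizer only reads what the sentence detector produced, which mirrors the idea that later engines build on the spans earlier engines added to the analyzed text.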

Linking / suggestions

  • Suggestion of entities for features present in the parsed content
  • Provides

    • Type
    • Description
    • Spatial and/or temporal content
    • Links to other entities
  • Named Entity

    • Suggests links to several linked data sources
  • Entityhub

    • Suggests links to entities managed by the Entityhub, referenced sites or managed sites
  • FST (Finite State Transducer)

    • Links entities indexed in a Solr index
  • Entity co-mention

    • Detects co-mentions of a detected entity at a later position
  • DBpedia Spotlight annotation engine

    • Includes NLP, Entity Linking and Disambiguation of Entities using DBpedia as knowledge base
  • Geonames

    • Suggests links to geonames.org
  • OpenCalais

    • Integrates services from OpenCalais
    • Provides both NER and entity linking
  • Zemanta

    • Integrates services from Zemanta
    • Provides both NER and entity linking
  • Sentiment analysis

    • Engines that perform word/chunk level sentiment classifications on the analyzed text
  • Disambiguation

    • Disambiguates entities (resolves ambiguities) based on contextual information

Postprocessing / other

  • Not yet covered :)

Enhancement structure

  • Represents the state of enhancement based upon the content item and the content parts
  • Gets constantly updated during the enhancement process

General information

Used namespaces

  • fise

    • This is the main namespace of the currently used Enhancement Structure. All custom concepts and properties are defined using this namespace.
  • enhancer

    • This is the main namespace of the Stanbol Enhancer defining concepts such as ContentItem, EnhancementEngine, EnhancementChain …
  • entityhub

    • This is the main namespace of the Stanbol Entityhub component.
  • dc

    • The Dublin Core terms standard is also heavily used by the Stanbol Enhancement Structure, especially to encode metadata, but also to encode relations between extracted information.
  • dbpedia-ont

    • Concepts of this Ontology are used to describe the types of "Named Entities" detected in parsed content.
  • skos

    • The SKOS standard is preferably used to describe entries of thesauri or, more generally, any type of controlled vocabulary.
  • rdf
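
To show how these prefixes are used in practice, the sketch below expands a prefixed name such as `fise:TextAnnotation` into a full URI. The namespace URIs are given from memory of the Stanbol and W3C documentation and should be treated as assumptions to verify, in particular those for `enhancer` and `entityhub`; the helper `expand` is hypothetical.

```python
# Assumed namespace URIs; verify against the Stanbol documentation before relying on them.
NAMESPACES = {
    "fise": "http://fise.iks-project.eu/ontology/",
    "enhancer": "http://stanbol.apache.org/ontology/enhancer/enhancer#",
    "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
    "dc": "http://purl.org/dc/terms/",
    "dbpedia-ont": "http://dbpedia.org/ontology/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI using the table above."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError("unknown or missing prefix in " + repr(prefixed_name))
    return NAMESPACES[prefix] + local


print(expand("fise:TextAnnotation"))
print(expand("rdf:type"))
```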

